An Efficient Hybrid Feature Selection Method based on Rough Set Theory for Short Text Representation

Authors

  • Mohammed Bekkali
  • Abdelmonaime Lachkar
Abstract

With the rapid development of the Internet and telecommunication industries, various forms of information, such as short text, play an important role in people's daily lives. These short texts suffer from the curse of dimensionality because of their sparse and noisy nature. Feature selection is a good way to address this problem: it is a process that extracts the feature subset most representative of the original feature set, which makes it an important step in improving the performance of any text mining task. In this paper, a hybrid feature selection method is proposed to improve Arabic short text representation. It is based on Rough Set Theory (RST), a mathematical tool for dealing with vagueness and uncertainty, and Latent Semantic Analysis (LSA), a theory for extracting and representing the contextual-usage meaning of words. The proposed method has been tested, evaluated and compared using an Arabic short text categorization system in terms of the F1-measure. The experimental results demonstrate the value of the proposed method.

Keywords: Arabic language; short text; feature selection; Rough Set Theory; Latent Semantic Analysis
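As a minimal illustration of the rough set notions the abstract relies on, the sketch below computes the lower and upper approximations of a document category from an indiscernibility relation over term features. It is not the authors' method: the toy documents, the term names `goal` and `vote`, the `sports` category and both helper functions are hypothetical, chosen only to show how RST separates documents that certainly belong to a category from those that only possibly belong.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group object ids that take identical values on the given attributes."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) rough set approximations of `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical toy data: four short texts described by the presence of two terms.
docs = {
    "d1": {"goal": 1, "vote": 0},
    "d2": {"goal": 1, "vote": 0},
    "d3": {"goal": 0, "vote": 1},
    "d4": {"goal": 0, "vote": 0},
}
sports = {"d1", "d3"}  # hypothetical category labels

lower, upper = approximations(docs, ["goal", "vote"], sports)
print(sorted(lower))  # documents that certainly belong to the category
print(sorted(upper))  # documents that possibly belong to the category
```

Here `d1` and `d2` are indiscernible on the chosen terms but only `d1` is in the category, so both fall in the boundary region (upper minus lower). A feature subset that shrinks this boundary describes the category more precisely, which is the intuition behind rough-set-based feature selection.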


Similar resources

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High dimensional microarray datasets are difficult to classify since they have many features with small number of instances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...


Sentiment Classification using Rough Set based Hybrid Feature Selection

Sentiment analysis extracts users' opinions from review documents. Sentiment classification using Machine Learning (ML) methods faces the problem of a high-dimensional feature vector. Therefore, a feature selection method is required to eliminate irrelevant and noisy features from the feature vector so that ML algorithms can work efficiently. Rough Set Theory based feature selectio...


A Knowledge-Based Feature Selection Method for Text Categorization

A major difficulty of text categorization is the high dimensionality of the original feature space. Feature selection plays an important role in text categorization. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), mutual information (MI), and so on are commonly applied in text categorization. Many existing experiments show IG is one of th...


A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram

Nowadays, the use of various messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based open-source text messaging service. According to the US Securities and Exchange Commission, and based on statistics for October 2019 to the present, 300 million people worldwide use Telegram each month. Telegram users are more concentrated in...


Feature Subset Selection and Classification Using Hybrid Improved SVM

Many feature subset selection algorithms have been proposed, but not all of them are appropriate for a given feature selection problem. At the same time, there is still rarely a good way to choose an appropriate feature subset selection algorithm for the problem at hand. Feature selection has become an essential element in the data mining process. This paper investigates the problem of effici...




Publication date: 2016